We are going to work with JSON files that come from what are called public APIs (APIs that anyone can interact with). Loading packages:
library("jsonlite")
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Now let’s move on to JSON files we can access directly from the web,
via a simple public API. Essentially, we are able to input a URL into
the fromJSON function, and read whatever JSON file is
returned. For now, we are just going to query a public API that has a
single non-variable endpoint which returns (in this case) a random fact
about cats:
api_url <- "https://catfact.ninja/fact"
fromJSON(api_url)
## $fact
## [1] "The life expectancy of cats has nearly doubled over the last fifty years."
##
## $length
## [1] 73
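Because this API returns only one fact per request, getting more
results means being a little creative: we call it repeatedly in a loop
(pausing for a second between calls, to be polite to the server) and
collect each response in a list. To combine the results, we use
do.call with rbind, together with an lapply across the elements of our
output list that converts each element into a tibble with as_tibble.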
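api_queryer <- function(results_count = 10, api_url = "https://catfact.ninja/fact"){
  results <- vector("list", results_count) # pre-allocate one slot per request
  for(i in seq_len(results_count)){
    results[[i]] <- fromJSON(api_url) # each call returns one random fact
    Sys.sleep(1) # pause between requests so we don't hammer the server
  }
  return(results)
}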
# run our function -- we can leave the defaults
cat_facts_output <- api_queryer()
# convert to a nicely formatted tibble (though in this case it would be ideal if it were a kibble -- this is a cat joke):
cat_facts <- do.call(rbind, lapply(cat_facts_output, as_tibble)) # tibbles never convert strings to factors, so no stringsAsFactors needed
cat_facts
## # A tibble: 10 × 2
## fact length
## <chr> <int>
## 1 If your cat snores, or rolls over on his back to expose his belly, it… 90
## 2 The cat has 500 skeletal muscles (humans have 650). 51
## 3 Baking chocolate is the most dangerous chocolate to your cat. 61
## 4 Two members of the cat family are distinct from all others: the cloud… 354
## 5 Neutering a cat extends its life span by two or three years. 60
## 6 Cats are the world's most popular pets, outnumbering dogs by as many … 84
## 7 A cat rubs against people not only to be affectionate but also to mar… 174
## 8 A cat has the ability to rotate their ears 180 degrees,with the help … 113
## 9 A cat’s jaw can’t move sideways, so a cat can’t chew large chunks of … 74
## 10 Cats see six times better in the dark and at night than humans. 63
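To see what do.call(rbind, lapply(...)) is doing without calling the API, here is a minimal base-R sketch on a hand-made list shaped like cat_facts_output. The two facts are copied from the output above, but mock_output itself is a stand-in, not a real API response. lapply turns each element into a one-row data frame, and do.call then passes all of those rows to rbind in a single call, exactly as if we had typed rbind(rows[[1]], rows[[2]]).

```r
# A stand-in for cat_facts_output: a list of parsed responses, each with $fact and $length
mock_output <- list(
  list(fact = "The cat has 500 skeletal muscles (humans have 650).", length = 51L),
  list(fact = "Neutering a cat extends its life span by two or three years.", length = 60L)
)

# Step 1: lapply converts each list element into a one-row data frame
rows <- lapply(mock_output, as.data.frame)

# Step 2: do.call calls rbind(rows[[1]], rows[[2]]), stacking the rows
facts_df <- do.call(rbind, rows)

facts_df
```

Swapping as.data.frame for as_tibble gives the tibble version used above; the do.call/rbind logic is the same.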
Now let’s say we’re working with more intricate APIs, where we need to request access from the platform. If our request is accepted, we typically receive an API key and a client ID. Getting an API key usually comes with the condition that it be kept secure and used only by the individual researcher. That means you cannot hardcode your API key into your script, and you cannot share it on, e.g., GitHub. But R still needs to know it so you can run and knit your code. The best practice is to store your API key in a local environment file (.Renviron) that stays on your computer.
# The command below creates (if needed) and opens your .Renviron file, which stays on your computer: never share it on GitHub or anywhere else
usethis::edit_r_environ()
## ☐ Edit '/Users/poirot/.Renviron'.
## ☐ Restart R for changes to take effect.
Once you’ve created an .Renviron file, you can populate it with your very own secret API key and client ID. It might look something like this:
NAME="Marion Lieutaud"
APIKEY="xxxxxxxxx"
Then save your .Renviron file and restart RStudio. Once you’ve done that, you should be able to access your environment variables as follows:
Sys.getenv('NAME')
## [1] ""
# This should return "Marion Lieutaud" in my case; Sys.getenv() returns "" when a variable is not set, as in the output above
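As the empty string above shows, Sys.getenv() fails silently when a variable is missing, so a typo in .Renviron would quietly send an empty key to an API. Here is a minimal sketch of a helper that fails loudly instead. The name get_secret and the variable DEMO_APIKEY are hypothetical, and the Sys.setenv() line only simulates a value read from .Renviron: never put a real key in a script.

```r
# Hypothetical helper: read a secret from the environment (e.g. one set in .Renviron)
# and stop with a clear message instead of returning an empty string
get_secret <- function(var = "APIKEY") {
  value <- Sys.getenv(var)  # "" when the variable is not set
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set. Add it to .Renviron and restart R.",
         call. = FALSE)
  }
  value
}

# Demonstration only: simulate a value that would normally come from .Renviron
Sys.setenv(DEMO_APIKEY = "not-a-real-key")
get_secret("DEMO_APIKEY")
```

How the key is then sent to the API (as a query parameter or a request header) depends on that API's documentation.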
Loading packages:
library("httr")
library("jsonlite")
library("tidyverse")
library("jpeg") #to let us read .jpegs/.jpgs
library("grid") #to let us plot images
Our cat facts API is cute but it’s not terribly useful unless we want to pull “random” objects out of an API pipeline, which probably is not the case. So instead, let’s explore a more useful public API, from the Art Institute of Chicago. This API has multiple models or “resources” (essentially, representations of the underlying data that exist in some relational databases somewhere – more next week), each of which can be queried via three endpoints.
Much like our cat facts API, we can just make a direct call to the base URL, which in this case corresponds to the listing endpoint. Queries to this endpoint return pages from the full listing of the AIC collection. By default we get back only the first page of results, with a lot of data for each artwork or each artist (depending on which model we query). The calls below are commented out to keep the output short; uncomment them to see the full response:
artworks_url <- "https://api.artic.edu/api/v1/artworks"
# fromJSON(artworks_url)
artists_url <- "https://api.artic.edu/api/v1/artists"
# fromJSON(artists_url)
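The three endpoints follow one URL pattern per model, which you can see in the artworks URLs used in this tutorial: /artworks for the listing, /artworks/{id} for the detail endpoint and /artworks/search for search. Here is a minimal sketch of a helper that builds them. aic_endpoint is a hypothetical name, and assuming that the artists model follows the same pattern is an extrapolation worth checking against the AIC documentation.

```r
# Hypothetical helper: build the listing, detail or search URL for an AIC model
aic_endpoint <- function(model, type = c("listing", "detail", "search"), id = NULL) {
  type <- match.arg(type)  # defaults to "listing"
  base <- paste0("https://api.artic.edu/api/v1/", model)
  switch(type,
    listing = base,
    detail  = {
      if (is.null(id)) stop("a detail request needs an id")
      paste0(base, "/", id)
    },
    search  = paste0(base, "/search")
  )
}

aic_endpoint("artworks")
aic_endpoint("artworks", "detail", id = 28560)
aic_endpoint("artists", "search")
```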
Let’s focus, for now, on the artworks model. If you run the query above, you’ll see that it returns a large number of columns (“fields”), many of which we don’t really want or need. Consulting the documentation, and using what we know about the structure of URLs, we see that we can specify the fields we want in our query:
artworks_url_fields <- "https://api.artic.edu/api/v1/artworks?fields=id,title,artist_display,date_display"
fromJSON(artworks_url_fields)
## $pagination
## $pagination$total
## [1] 127616
##
## $pagination$limit
## [1] 12
##
## $pagination$offset
## [1] 0
##
## $pagination$total_pages
## [1] 10635
##
## $pagination$current_page
## [1] 1
##
## $pagination$next_url
## [1] "https://api.artic.edu/api/v1/artworks?page=2&fields=id%2Ctitle%2Cartist_display%2Cdate_display"
##
##
## $data
## id title
## 1 14620 Cliff Walk at Pourville
## 2 15857 Cabaret Scene
## 3 15854 Seated Female Nude
## 4 20684 Paris Street; Rainy Day
## 5 18579 Chickens
## 6 21954 Bird-Shaped Water Dropper
## 7 21893 Bamboo Shoot-Shaped Ewer
## 8 22525 Bird Shaped Ewer with Daoist Priest
## 9 22191 Claudine Resting
## 10 24202 La Java
## 11 24306 Blue and Green Music
## 12 27949 Madame Roulin Rocking the Cradle (La berceuse)
## date_display
## 1 1882
## 2 c. 1920
## 3 c. 1925
## 4 1877
## 5 1933
## 6 Goryeo dynasty (918–1392), mid–12th century
## 7 Goryeo dynasty (918–1392), 12th century
## 8 Goryeo dynasty (918–1392), 12th century
## 9 1913
## 10 1925
## 11 1919–21
## 12 1889
## artist_display
## 1 Claude Monet (French, 1840–1926)
## 2 André Lhote\nFrench, 1885-1962
## 3 André Lhote\nFrench, 1885-1962
## 4 Gustave Caillebotte (French, 1848–1894)
## 5 Edgar Miller\nAmerican, born 1899
## 6 Korea
## 7 Korea
## 8 Korea
## 9 Jules Pascin\nAmerican, born Bulgaria, 1885-1930
## 10 Georges Emile Capon\nFrench, 1890-1980
## 11 Georgia O'Keeffe (American, 1887–1986)
## 12 Vincent van Gogh (Dutch, 1853–1890)
##
## $info
## $info$license_text
## [1] "The `description` field in this response is licensed under a Creative Commons Attribution 4.0 Generic License (CC-By) and the Terms and Conditions of artic.edu. All other data in this response is licensed under a Creative Commons Zero (CC0) 1.0 designation and the Terms and Conditions of artic.edu."
##
## $info$license_links
## [1] "https://creativecommons.org/publicdomain/zero/1.0/"
## [2] "https://www.artic.edu/terms"
##
## $info$version
## [1] "1.13"
##
##
## $config
## $config$iiif_url
## [1] "https://www.artic.edu/iiif/2"
##
## $config$website_url
## [1] "http://www.artic.edu"
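The $pagination block tells us how the listing is split up: 12 records per page, more than ten thousand pages, and a next_url that simply adds page=2 to our query. Here is a minimal sketch for generating the first few page URLs. page_urls is a hypothetical helper; the page parameter is confirmed by next_url above, while passing limit as a query parameter is an assumption to check against the AIC documentation.

```r
# Hypothetical helper: URLs for the first n pages of a listing query
page_urls <- function(base_url, n_pages, fields, limit = 12) {
  pages <- seq_len(n_pages)
  # paste0 is vectorised over `pages`, so this returns one URL per page
  paste0(base_url, "?page=", pages, "&limit=", limit, "&fields=", fields)
}

urls <- page_urls("https://api.artic.edu/api/v1/artworks",
                  n_pages = 3,
                  fields = "id,title,artist_display,date_display")
urls
```

Each URL could then be passed to fromJSON in a loop with a Sys.sleep() between requests, as in api_queryer; with over ten thousand pages, it is worth narrowing the query before paging through everything.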
Now let’s switch to a different endpoint, the detail
endpoint, where we can request information on specific artworks.
We’re still using the artworks model, and again we’ll only query
specific fields for the artwork(s) of interest. We’ll start to build
this up in a slightly more principled fashion, using
paste0(), which concatenates strings. Below, the first
string in our paste0() call is the
artworks_url model URL we defined above, the second string
is the required "/" separator, the third string is the ID of the
artwork of interest (can you figure out which artwork it
is?), and the fourth string is the specific set of fields we
want.
# define our fields of interest
fields <- "?fields=id,title,artist_display,date_display"
# provide an artwork to study
artwork <- "28560"
# build the query and retrieve JSON
artwork_detail_url <- paste0(artworks_url, "/", artwork, fields)
fromJSON(artwork_detail_url)
## $data
## $data$id
## [1] 28560
##
## $data$title
## [1] "The Bedroom"
##
## $data$date_display
## [1] "1889"
##
## $data$artist_display
## [1] "Vincent van Gogh (Dutch, 1853–1890)"
##
##
## $info
## $info$license_text
## [1] "The `description` field in this response is licensed under a Creative Commons Attribution 4.0 Generic License (CC-By) and the Terms and Conditions of artic.edu. All other data in this response is licensed under a Creative Commons Zero (CC0) 1.0 designation and the Terms and Conditions of artic.edu."
##
## $info$license_links
## [1] "https://creativecommons.org/publicdomain/zero/1.0/"
## [2] "https://www.artic.edu/terms"
##
## $info$version
## [1] "1.13"
##
##
## $config
## $config$iiif_url
## [1] "https://www.artic.edu/iiif/2"
##
## $config$website_url
## [1] "http://www.artic.edu"
# to show only the data we want
fromJSON(artwork_detail_url)$data
## $id
## [1] 28560
##
## $title
## [1] "The Bedroom"
##
## $date_display
## [1] "1889"
##
## $artist_display
## [1] "Vincent van Gogh (Dutch, 1853–1890)"
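Because paste0() is vectorised, the same line builds one detail URL per artwork when we give it several IDs. The three IDs below are taken from the outputs above (The Bedroom, Madame Roulin Rocking the Cradle and Cliff Walk at Pourville); this is just a sketch of the URL-building step, with no requests sent.

```r
artworks_url <- "https://api.artic.edu/api/v1/artworks"
fields <- "?fields=id,title,artist_display,date_display"

# one detail URL per artwork id, thanks to paste0's vectorisation
artwork_ids <- c("28560", "27949", "14620")
detail_urls <- paste0(artworks_url, "/", artwork_ids, fields)
detail_urls
```

These could then be fetched one at a time, e.g. lapply(detail_urls, function(u) { Sys.sleep(1); fromJSON(u)$data }), keeping the same one-second pause as before.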
The next endpoint is perhaps the most interesting for us: the search endpoint. This allows us to search the model of interest, and return only the data that results from that search. This is great because it lets us narrow down our requests and not overload the AIC’s servers, and because it lets us look for specific types of art (you can imagine how useful this would be in a social science application). In furtherance of our feline API efforts, let’s start by searching for artwork about cats:
# artworks model, search endpoint url:
artworks_search_url <- "https://api.artic.edu/api/v1/artworks/search?q="
# define search terms. we use gsub(" ", "%20", "x") here to replace spaces between search terms with "%20" which is how we often represent spaces in a URL.
search_terms <- gsub(" ", "%20", "cat")
# build the query:
cat_search_url <- paste0(artworks_search_url, search_terms)
fromJSON(cat_search_url)
## $preference
## NULL
##
## $pagination
## $pagination$total
## [1] 7869
##
## $pagination$limit
## [1] 10
##
## $pagination$offset
## [1] 0
##
## $pagination$total_pages
## [1] 787
##
## $pagination$current_page
## [1] 1
##
##
## $data
## _score id api_model api_link
## 1 135.57333 656 artworks https://api.artic.edu/api/v1/artworks/656
## 2 119.93909 117241 artworks https://api.artic.edu/api/v1/artworks/117241
## 3 116.97719 45259 artworks https://api.artic.edu/api/v1/artworks/45259
## 4 104.34673 16227 artworks https://api.artic.edu/api/v1/artworks/16227
## 5 97.81632 22482 artworks https://api.artic.edu/api/v1/artworks/22482
## 6 94.19799 51719 artworks https://api.artic.edu/api/v1/artworks/51719
## 7 92.74848 158921 artworks https://api.artic.edu/api/v1/artworks/158921
## 8 92.40688 119335 artworks https://api.artic.edu/api/v1/artworks/119335
## 9 92.03020 68825 artworks https://api.artic.edu/api/v1/artworks/68825
## 10 89.77347 5522 artworks https://api.artic.edu/api/v1/artworks/5522
## is_boosted title
## 1 FALSE Lion (One of a Pair, South Pedestal)
## 2 FALSE Girl with Cat
## 3 FALSE Nude with Cats
## 4 FALSE Cat Making Up
## 5 FALSE Homesickness
## 6 FALSE Winter: Cat on a Cushion
## 7 FALSE Courtesan Playing with a Cat
## 8 FALSE Baroque Pearl Mounted as a Cat Holding a Mouse
## 9 FALSE The Cats' Rendezvous
## 10 FALSE Cat Coffin
## thumbnail.lqip
## 1 data:image/gif;base64,R0lGODlhCAAFAPUAADY5NTQ/PkI4IT1COjtEQkNHQk5YWFteWWtkTmtoX291WnJ1WX1xZ355cYZ3Zn+Cf42KaoWEeJOKfpeWcoCEgI2OiJaQgp+XgpWRiJWTjKKbjqqhkK+olbGml7q0q7e2sMG5rcbAs8fBuMrGv8/HvsnJud3d3evr7gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAIAAUAAAYlQJPnIwKRQpPTyPDgQDaC0gUQIAwsDISjcUgUMB2JpkLJRBSLIAA7
## 2 data:image/gif;base64,R0lGODlhBAAFAPQAACUiHCckHSghGi4nGislHi0mHjMvJDUuJDUwITcwIz07LUY+LUhALktGNExGNFFINGhYOWdcQXVlRYRmSAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAEAAUAAAUR4EAwC3JMjxJJhgM1QSEkQAgAOw==
## 3 data:image/gif;base64,R0lGODlhBAAFAPQAAFpcVVthVGBgSndvSW5hUWtlWHJvVWxpYHJ1bYd+an2Feo6Da4iGcZmTfqWdi6SgkK+rnLS7q7u4qb68rAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAEAAUAAAURoBIIQ8QQBlIsDXAkzjNJUAgAOw==
## 4 data:image/gif;base64,R0lGODlhBAAFAPQAACAmNC0wPW1VOTw/SVlWUXBlVnV2doVnQ494WYx8aI+BboWAedipa4iOlYyQlI+UmaabkLKpmta2jKi52wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAEAAUAAAURoAJNw5IUQEMggWNIwhMxRwgAOw==
## 5 data:image/gif;base64,R0lGODlhBAAFAPQAAJB4cbqIcIp/irSQgrWXhbScjbGZlqqdqqicr66iusCklcGpm7m0zb+409LH0NPF09XJ1tbL1tnM2d7P3wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAEAAUAAAURYCRBDtNMD5IIxxAARkEoSwgAOw==
## 6 data:image/gif;base64,R0lGODlhBgAFAPQAAEI9JFFFLV5QPWVaOn1VR31rUHhpV3BnXHprWYBhRpJgUIdyU6VqW6pxXKR5WId9b458bcBvZcdwaM14btR3b9V8c9F/eI6Dc6uHeqKSfcORhtiVjdmWjt6ZkAAAAAAAACH5BAAAAAAALAAAAAAGAAUAAAUY4GUgxZI9QgAk2jE4CrNBhFVRHdZMkcSFADs=
## 7 data:image/gif;base64,R0lGODlhAwAFAPMAAK+RdLmlgryjhrCgiL2oj8KymsWzmMq2msvArMvBrNDBrNDGsdjLtNnOtwAAAAAAACH5BAAAAAAALAAAAAADAAUAAAQLMLFmRCvhDECWQhEAOw==
## 8 data:image/gif;base64,R0lGODlhBgAFAPQAAKGJbLOYdbKafMGhdJ6fn6eWgLSgiLCnnKKhoaWjpKakpaelpqenp66wtbKwsbSztLW0tLi3uLe5v7y9vsasjsO6ssK9vMPAvsfBvMXBwMTCwsLCxcjIyO3m3wAAAAAAACH5BAAAAAAALAAAAAAGAAUAAAUYILMoCUJAjxMczXR1QyFtBiUAEYdVVqaFADs=
## 9 data:image/gif;base64,R0lGODlhBAAFAPQAAC0nHFpSRH92Z4V+boyDdJOId6KWh6GYiaacjKidja6lkq6klLKolrmunb6zosO4psK6qMe6qNHEstPItwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAAAAAAALAAAAAAEAAUAAAURYIQs0zMojQBARxEwhOQkRggAOw==
## 10 data:image/gif;base64,R0lGODlhAwAFAPMAAGlQQXhiVG9yfZF9b4eAgJaKh4KGkKOXk6SZlJKWoaehoaKiqKmorbK5x+Tn8AAAACH5BAAAAAAALAAAAAADAAUAAAQL0BDBRkKhFbDcUREAOw==
## thumbnail.width thumbnail.height
## 1 8430 5620
## 2 6486 7661
## 3 4603 5060
## 4 1645 2250
## 5 1754 2250
## 6 2928 2250
## 7 2039 3800
## 8 2911 2250
## 9 1767 2250
## 10 1500 2250
## thumbnail.alt_text
## 1 A bronze lion, deep green and muscular, looks out in the distance from its pedestal in front of the Art Institute of Chicago.
## 2 A work made of oil on board.
## 3 A work made of oil on cardboard.
## 4 A color woodblock print of two orangeand white cats with pink noses represented in cubist-like design against a blue, gray and black background
## 5 A work made of watercolor and gouache, over touches of graphite, on cream wove paper.
## 6 A work made of lithograph in 6 colors (red, ochre, yellow, black, gray-brown, brown) from two stones, with scraping on stone, on ivory wove paper.
## 7 A work made of hand-colored woodblock print; tan-e, vertical o-oban.
## 8 A work made of gold, enamel, and baroque pearl.
## 9 A work made of lithograph in black on ivory wove paper, laid down on ivory cloth.
## 10 A work made of wood and plaster.
## timestamp
## 1 2025-03-15T23:26:02-05:00
## 2 2025-03-15T22:39:07-05:00
## 3 2025-03-15T22:21:05-05:00
## 4 2025-03-15T22:14:04-05:00
## 5 2025-03-15T22:15:41-05:00
## 6 2025-03-15T22:22:54-05:00
## 7 2025-03-15T22:50:24-05:00
## 8 2025-03-15T22:39:41-05:00
## 9 2025-03-15T23:20:25-05:00
## 10 2025-03-15T22:11:34-05:00
##
## $info
## $info$license_text
## [1] "The `description` field in this response is licensed under a Creative Commons Attribution 4.0 Generic License (CC-By) and the Terms and Conditions of artic.edu. All other data in this response is licensed under a Creative Commons Zero (CC0) 1.0 designation and the Terms and Conditions of artic.edu."
##
## $info$license_links
## [1] "https://creativecommons.org/publicdomain/zero/1.0/"
## [2] "https://www.artic.edu/terms"
##
## $info$version
## [1] "1.13"
##
##
## $config
## $config$iiif_url
## [1] "https://www.artic.edu/iiif/2"
##
## $config$website_url
## [1] "http://www.artic.edu"
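The gsub(" ", "%20", ...) trick handles spaces, but search terms can contain other characters with special meaning in a URL. In particular, a bare "&" would be read as the start of a new query parameter. Base R's utils::URLencode() with reserved = TRUE escapes these reserved characters too; here is a small comparison on a made-up search term.

```r
query <- "cats & dogs"

# gsub only replaces spaces: the "&" survives and would split the query string
naive <- gsub(" ", "%20", query)

# URLencode with reserved = TRUE also percent-encodes reserved characters such as "&"
encoded <- utils::URLencode(query, reserved = TRUE)

naive    # "cats%20&%20dogs"
encoded  # "cats%20%26%20dogs"
```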
What we’ve done above is to use the logic of the AIC API to build
particular URLs of interest, and then query them directly with
fromJSON. Now let’s start writing queries in a slightly
more elegant fashion, using the httr package (you can also
use httr2, a newer rewrite of httr that its developers
now recommend for new code). These packages are high-level interfaces
to curl,
developed for flexible and customisable querying of web resources from
R. For now, we will use the GET function from
httr. Among other things, GET
lets us pass the url of interest, an additional
path (which we won’t use in this case), and a
query list, which can take as many elements as there are
parameters for our API; httr takes care of encoding these into the URL.
# build the API GET request
cat_search <- GET(artworks_search_url, # the API endpoint of interest
query = list(q = search_terms,
fields = "id,title,artist_display,date_display",
size = 10)) # query allows us to specify parameters, which we find in the API documentation
# parse the content returned from our GET request
json_cat_search <- content(cat_search, "parsed")
# let's inspect our content
json_cat_search
## $preference
## NULL
##
## $pagination
## $pagination$total
## [1] 7869
##
## $pagination$limit
## [1] 10
##
## $pagination$offset
## [1] 0
##
## $pagination$total_pages
## [1] 787
##
## $pagination$current_page
## [1] 1
##
##
## $data
## $data[[1]]
## $data[[1]]$`_score`
## [1] 135.879
##
## $data[[1]]$id
## [1] 656
##
## $data[[1]]$title
## [1] "Lion (One of a Pair, South Pedestal)"
##
## $data[[1]]$date_display
## [1] "1893"
##
## $data[[1]]$artist_display
## [1] "Edward Kemeys (American, 1843–1907)\nAmerican Bronze Founding Company\nChicago"
##
##
## $data[[2]]
## $data[[2]]$`_score`
## [1] 120.4231
##
## $data[[2]]$id
## [1] 117241
##
## $data[[2]]$title
## [1] "Girl with Cat"
##
## $data[[2]]$date_display
## [1] "1937"
##
## $data[[2]]$artist_display
## [1] "Balthus (Baltusz Klossowski de Rola)\nFrench, 1908–2001"
##
##
## $data[[3]]
## $data[[3]]$`_score`
## [1] 116.9664
##
## $data[[3]]$id
## [1] 45259
##
## $data[[3]]$title
## [1] "Nude with Cats"
##
## $data[[3]]$date_display
## [1] "1901"
##
## $data[[3]]$artist_display
## [1] "Pablo Picasso\nSpanish, active France, 1881-1973"
##
##
## $data[[4]]
## $data[[4]]$`_score`
## [1] 104.4088
##
## $data[[4]]$id
## [1] 16227
##
## $data[[4]]$title
## [1] "Cat Making Up"
##
## $data[[4]]$date_display
## [1] "1962"
##
## $data[[4]]$artist_display
## [1] "Inagaki Tomoo\nJapanese, 1902–1980"
##
##
## $data[[5]]
## $data[[5]]$`_score`
## [1] 98.11973
##
## $data[[5]]$id
## [1] 22482
##
## $data[[5]]$title
## [1] "Homesickness"
##
## $data[[5]]$date_display
## [1] "c. 1948"
##
## $data[[5]]$artist_display
## [1] "René Magritte\nBelgian, 1898-1967"
##
##
## $data[[6]]
## $data[[6]]$`_score`
## [1] 94.25226
##
## $data[[6]]$id
## [1] 51719
##
## $data[[6]]$title
## [1] "Winter: Cat on a Cushion"
##
## $data[[6]]$date_display
## [1] "1909"
##
## $data[[6]]$artist_display
## [1] "Théophile-Alexandre Steinlen\nFrench, born Switzerland, 1859-1923"
##
##
## $data[[7]]
## $data[[7]]$`_score`
## [1] 93.10331
##
## $data[[7]]$id
## [1] 158921
##
## $data[[7]]$title
## [1] "Courtesan Playing with a Cat"
##
## $data[[7]]$date_display
## [1] "c. 1715"
##
## $data[[7]]$artist_display
## [1] "Kaigetsudo Dohan\nJapanese, active c. 1704-16"
##
##
## $data[[8]]
## $data[[8]]$`_score`
## [1] 92.47557
##
## $data[[8]]$id
## [1] 119335
##
## $data[[8]]$title
## [1] "Baroque Pearl Mounted as a Cat Holding a Mouse"
##
## $data[[8]]$date_display
## [1] "17th century"
##
## $data[[8]]$artist_display
## [1] "Spanish or south German"
##
##
## $data[[9]]
## $data[[9]]$`_score`
## [1] 92.40161
##
## $data[[9]]$id
## [1] 68825
##
## $data[[9]]$title
## [1] "The Cats' Rendezvous"
##
## $data[[9]]$date_display
## [1] "1868"
##
## $data[[9]]$artist_display
## [1] "Édouard Manet\nFrench, 1832-1883"
##
##
## $data[[10]]
## $data[[10]]$`_score`
## [1] 90.12851
##
## $data[[10]]$id
## [1] 9372
##
## $data[[10]]$title
## [1] "The Large Cat"
##
## $data[[10]]$date_display
## [1] "1657"
##
## $data[[10]]$artist_display
## [1] "Cornelis Visscher \nDutch, c. 1629-1658"
##
##
##
## $info
## $info$license_text
## [1] "The `description` field in this response is licensed under a Creative Commons Attribution 4.0 Generic License (CC-By) and the Terms and Conditions of artic.edu. All other data in this response is licensed under a Creative Commons Zero (CC0) 1.0 designation and the Terms and Conditions of artic.edu."
##
## $info$license_links
## $info$license_links[[1]]
## [1] "https://creativecommons.org/publicdomain/zero/1.0/"
##
## $info$license_links[[2]]
## [1] "https://www.artic.edu/terms"
##
##
## $info$version
## [1] "1.13"
##
##
## $config
## $config$iiif_url
## [1] "https://www.artic.edu/iiif/2"
##
## $config$website_url
## [1] "http://www.artic.edu"
# not so useful! so let's see what we got in a slightly easier way...
names(json_cat_search)
## [1] "preference" "pagination" "data" "info" "config"
# $data is what we want. so let's use do.call, rbind, and lapply to extract all the data from our returned content, and format it as a tidy tibble
cat_art <- do.call(rbind, lapply(json_cat_search$data, as_tibble)) %>%
  select(-`_score`) # removing the search score, but you can keep it if interesting to you
# let's look at our tibble
cat_art
## # A tibble: 10 × 4
## id title date_display artist_display
## <int> <chr> <chr> <chr>
## 1 656 Lion (One of a Pair, South Pedestal) 1893 "Edward Kemey…
## 2 117241 Girl with Cat 1937 "Balthus (Bal…
## 3 45259 Nude with Cats 1901 "Pablo Picass…
## 4 16227 Cat Making Up 1962 "Inagaki Tomo…
## 5 22482 Homesickness c. 1948 "René Magritt…
## 6 51719 Winter: Cat on a Cushion 1909 "Théophile-Al…
## 7 158921 Courtesan Playing with a Cat c. 1715 "Kaigetsudo D…
## 8 119335 Baroque Pearl Mounted as a Cat Holding a … 17th century "Spanish or s…
## 9 68825 The Cats' Rendezvous 1868 "Édouard Mane…
## 10 9372 The Large Cat 1657 "Cornelis Vis…
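The query list we passed to GET() is turned into a URL query string for us, and httr percent-encodes each value itself. The sketch below mimics that behaviour in base R to show what ends up in the URL; build_query is a hypothetical helper, not an httr function. Notice that the encoded comma (%2C) matches the next_url the API returned earlier, and that a search term already run through gsub(" ", "%20", ...) gets encoded a second time, so the "%" itself becomes "%25".

```r
# Hypothetical helper mimicking how a query list becomes a query string:
# every value is percent-encoded, then joined as name=value pairs separated by "&"
build_query <- function(params) {
  values <- vapply(params, function(v) {
    # repeated = TRUE: encode even if the value already contains %XX sequences
    utils::URLencode(as.character(v), reserved = TRUE, repeated = TRUE)
  }, character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

build_query(list(q = "cat", fields = "id,title", size = 10))
# a pre-encoded search term gets encoded twice
build_query(list(q = "modern%20art"))
```

So when using a query list, pass the raw search term (e.g. "modern art") and let the package do the encoding.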
So far we have used the AIC API to extract information about the
collection and its artworks. That’s nice, but there are more interesting
things we can do. The AIC supports a second – different – API that
allows us to download .jpeg copies of their artwork. We’re
now going to learn how to download and visualise these images in R.
First, we retrieve from the main API (still via the artworks search endpoint) the image_id field for the pieces of interest; the image id is not the same as the artwork id! Then we can query the alternative API to retrieve the actual images.
# query the API:
cat_image_search <- GET(artworks_search_url, # the API endpoint of interest
query = list(q = search_terms,
fields = "title, image_id",
size = 1)) # query allows us to specify parameters, which we find in the API documentation
json_cat_image_search <- content(cat_image_search, "parsed")
# directly extract the image id (as we are just working with one request, we don't need to worry about flattening the data)
cat_image_id <- json_cat_image_search$data[[1]]$image_id
# now, we introduce our alternative API, the AIC's IIIF (International Image Interoperability Framework) API
iiif_url <- "https://www.artic.edu/iiif/2"
# using our iiif_url and our cat_image_id, plus some formatting as provided by the AIC API documentation, we get
iiif_url_artwork <- paste0(iiif_url, "/", cat_image_id, "/full/843,/0/default.jpg")
# create a temporary file path to hold the downloaded image for this R session
# (later, when we retrieve several images, we will save them to a local folder instead)
temp <- tempfile(fileext = ".jpg")
# download the file from our API URL
download.file(iiif_url_artwork, temp, mode="wb")
# read the downloaded JPEG into R as a numeric array (height x width x colour channels)
image_to_plot <- readJPEG(temp)
class(image_to_plot)
## [1] "array"
# plot our image, using ggplot (can also use base R)
ggplot() +
annotation_custom(rasterGrob(image_to_plot), xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
theme_void() +
theme(plot.margin = unit(rep(0, 4), "null"))
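The "/full/843,/0/default.jpg" suffix follows the IIIF Image API URL pattern {identifier}/{region}/{size}/{rotation}/{quality}.{format}: the full image region, a size of "843," (843 pixels wide, with the height scaled to keep the aspect ratio), no rotation, and the default quality as a JPEG. Here is a minimal sketch of a helper that makes the width adjustable; iiif_image_url is a hypothetical name, and "abc" is a placeholder, not a real image id.

```r
# Hypothetical helper assembling an AIC IIIF image URL:
# {iiif_url}/{image_id}/full/{width},/0/default.jpg
iiif_image_url <- function(image_id, width = 843,
                           iiif_url = "https://www.artic.edu/iiif/2") {
  size <- paste0(width, ",")  # "w," = w pixels wide, height scaled proportionally
  paste0(iiif_url, "/", image_id, "/full/", size, "/0/default.jpg")
}

iiif_image_url("abc")              # placeholder id
iiif_image_url("abc", width = 400) # a smaller version of the same image
```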
Finally, let’s build a piece of code for requesting and plotting artwork from the AIC, using any set of search terms we want.
# let's build a function
art_image_search <- function(search_term, n_images = 5, output_dir = "temp_images", clear_directory = TRUE, plot_images = TRUE) {
  # no manual "%20" encoding here: httr percent-encodes query values itself,
  # so pre-encoding would turn "%20" into "%2520"
  images_search_url <- "https://api.artic.edu/api/v1/artworks/search"
  images_search_out <- GET(images_search_url, # the API endpoint of interest
                           query = list(q = search_term,
                                        fields = "id,title,artist_display,image_id",
                                        size = n_images)) # query allows us to specify parameters, which we find in the API documentation
  stop_for_status(images_search_out) # stop early if the request failed
  json_images_search_out <- content(images_search_out, "parsed")
  # replace NULL values (e.g. artworks with no image) with NA values, so every record has every field
  json_images_search_out$data <- lapply(json_images_search_out$data, function(record) {
    lapply(record, function(value) if (is.null(value)) NA else value)
  })
  image_ids <- do.call(rbind, lapply(json_images_search_out$data, as_tibble)) %>%
    dplyr::select(id, title, artist_display, image_id)
  # we now check if our output directory exists. if not, we create it. if it does and we want to clear the directory, we do so. else, proceed.
  if (!dir.exists(output_dir)) {
    dir.create(output_dir)
  } else if (clear_directory) {
    unlink(output_dir, recursive = TRUE, force = TRUE)
    dir.create(output_dir)
  }
  # now move to image API query
  iiif_url <- "https://www.artic.edu/iiif/2"
  # now work through the image ids, with api queries:
  for (i in seq_len(nrow(image_ids))) {
    # skip artworks that have no image
    if (is.na(image_ids$image_id[i])) next
    file <- file.path(output_dir, paste0(image_ids$id[i], ".jpg"))
    # try() here allows our request to fail without interrupting the run
    result <- try(download.file(paste0(iiif_url, "/", image_ids$image_id[i], "/full/843,/0/default.jpg"),
                                file, mode = "wb"))
    # remove any partial file left behind by a failed download
    if (inherits(result, "try-error") && file.exists(file)) file.remove(file)
    # take a breath
    Sys.sleep(1)
  }
  # enumerate our successfully downloaded files
  downloads <- list.files(output_dir, pattern = "\\.jpg$")
  # now, if we want to plot images, we save them to a list of ggplots
  if (plot_images) {
    images <- list()
    for (j in seq_along(downloads)) {
      image_to_plot <- readJPEG(file.path(output_dir, downloads[j]))
      id <- sub("\\.jpg$", "", downloads[j])
      artist <- image_ids$artist_display[image_ids$id == id]
      title <- image_ids$title[image_ids$id == id]
      title_for_image <- paste0(title, " by ", artist)
      images[[j]] <- ggplot() +
        annotation_custom(rasterGrob(image_to_plot), xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
        ggtitle(str_wrap(title_for_image, 80)) +
        theme_void()
    }
    # return the list of plots
    return(images)
  }
}
modern_art_images <- art_image_search("modern art", 10)
modern_art_images
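(Output: a list of six ggplot objects, one per image that was successfully downloaded, each titled with the artwork's title and artist; the plots themselves are not reproduced here.)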
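One step of art_image_search deserves a closer look: fields the API has no value for (typically image_id for an artwork without an image) come back as NULL in the parsed list, and a NULL field is simply dropped when a record becomes a one-row table, so the rows no longer line up for rbind. Here is that NULL-to-NA replacement in isolation, as a small base-R sketch; null_to_na and the two records are hypothetical stand-ins for json_images_search_out$data.

```r
# Hypothetical helper: replace NULL fields in a parsed JSON record with NA
null_to_na <- function(record) {
  lapply(record, function(value) if (is.null(value)) NA else value)
}

# stand-in for the parsed search results: the second record has no image
records <- list(
  list(id = 1L, title = "First work", image_id = "abc"),
  list(id = 2L, title = "Second work", image_id = NULL)
)

clean <- lapply(records, null_to_na)
# every record now has all three fields, so the rows stack cleanly
result <- do.call(rbind, lapply(clean, as.data.frame))
result
```

Records with NA in image_id can then be skipped before requesting images, since there is nothing to download for them.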